-
Notifications
You must be signed in to change notification settings - Fork 642
CASSGO-11 Support vector type #1828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Have you tested it with protoV5? If not I would be glad to handle it with #1822. |
Hmm protocol v5 shouldn't matter much here, the encoding of the type itself is the same regardless of whether v4 or v5 is being used. |
joao-r-reis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking solid, I left a few comments. Also note that you left a few TODOs.
joao-r-reis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it 👍 added a couple of small comments
joao-r-reis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff 👍
marshal.go
Outdated
| } | ||
| if k == reflect.Array { | ||
| if rv.Len() != info.Dimensions { | ||
| return unmarshalErrorf("unmarshal vector: array with wrong size") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: maybe add the expected and provided sizes to the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
vector_test.go
Outdated
| {name: "smallint", cqlType: TypeSmallInt.String(), value: []int16{127, 256, -1234}}, | ||
| {name: "tinyint", cqlType: TypeTinyInt.String(), value: []int8{127, 9, -123}}, | ||
| {name: "duration", cqlType: TypeDuration.String(), value: []Duration{duration1, duration2, duration3}}, | ||
| // TODO(lantonia): Test vector of custom types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: a couple of TODOs in this test still
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| sleep 30s | ||
| - name: Integration tests | ||
| run: | | ||
| source ~/venv/bin/activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a reminder to remove the workflow changes from this PR before wrapping up (we'll have to potentially merge #1833 first) when we get a second +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this PR will wait for #1833. I will remove workflow changes form this PR, rebase and squash commits.
|
I'd be curious your thoughts on if implementing this after #1855 would've made it easier. |
@lukasz-antoniak can you explore this? I imagine that this implementation doesn't have much overlap with #1855 and I'd like to get it in before #1855 if this is the case but I'd like your thoughts on this @lukasz-antoniak If we do want to get this done before #1855 then we need a rebase on this PR and then I can review it again |
joao-r-reis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I think we need to move the C* versions to 4.1.x and 5.0.x in CI so that we get coverage for these tests before we approve the PR to be merged
| }, | ||
| }, | ||
| { | ||
| "vector<vector<float, 3>, 5>", VectorType{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a couple of cases with more complex types like vector<map<uuid,timestamp>, 5> for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two more - one with map, and other with frozen tuple.
worryg0d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good to me, but we still need a couple of folks to review this one just to ensure we didn't miss something.
helpers.go
Outdated
| Elements: fields, | ||
| } | ||
| } else if strings.HasPrefix(name, VECTOR_TYPE) { | ||
| names := splitJavaCompositeTypes(strings.TrimPrefix(name[:len(name)-1], VECTOR_TYPE+"(")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the splitCQLCompositeTypes and splitJavaCompositeTypes, the prefix trimming logic is repeated in multiple cases. It could be moved inside those funcs, which would make the code cleaner and reduce repetition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| func getCassandraType(name string, logger StdLogger) TypeInfo { | ||
| // Parses long Java-style type definition to internal data structures. | ||
| func getCassandraLongType(name string, protoVer byte, logger StdLogger) TypeInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a TODO to cover getCassandraLongType with tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now added TODO. This is tested indirectly in e.g. TestFunctionMetadata.
| case TypeCustom: | ||
| switch info.(type) { | ||
| case VectorType: | ||
| return marshalVector(info.(VectorType), value) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can simplify this block of code by assigning the type to var so you eliminate the need for the second type assertion:
| case TypeCustom: | |
| switch info.(type) { | |
| case VectorType: | |
| return marshalVector(info.(VectorType), value) | |
| } | |
| case TypeCustom: | |
| switch t := info.(type) { | |
| case VectorType: | |
| return marshalVector(t, value) | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the switch statement appears to only handle the VectorType case without any default case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested switch looks redundant.
Maybe something like this would be more suitable?
case TypeCustom:
if vector, ok := info.(VectorType); ok {
return marshalVector(vector, value)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| case TypeCustom: | ||
| switch info.(type) { | ||
| case VectorType: | ||
| return unmarshalVector(info.(VectorType), data, value) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
marshal.go
Outdated
| } | ||
| return buf.Bytes(), nil | ||
| } | ||
| return nil, marshalErrorf("can not marshal %T into %s", value, info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be helpful to mention which types are actually marshalable (e.g., array, slice)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is in-line with others. Do we want to make an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually a messadges with the provided accepted types like in here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
marshal.go
Outdated
| } | ||
| return nil | ||
| } | ||
| return unmarshalErrorf("can not unmarshal %s into %T", info, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| if isVectorVariableLengthType(info.SubType) { | ||
| writeUnsignedVInt(buf, uint64(len(item))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: These functions could use some documentation imo—particularly explaining how it works, the intended purpose, and what happens if info.SubType is not isVectorVariableLengthType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When dealing with fix-length types, serializing length of the element is not required. isVectorVariableLengthType returns true for all types that require serialization of their length. Not sure what comment should be appropriate, which does not just read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation — that clarifies things. I was curios how data considered "Vector Variable Length" in this case. So if you like it, I suggest adding comment as following:
// isVectorVariableLengthType determines if a type requires explicit length serialization within a vector.
Variable-length types need their length encoded before the actual data to allow proper deserialization.
Fixed-length types, on the other hand, don't require this kind of length prefix.
Also, if there are no serialization requirements for fixed-length types, should TypeSmallInt and TypeTinyInt be removed from isVectorVariableLengthType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added your comment. Unfortunately for those types server expects the length also.
| switch elemType.Type() { | ||
| case TypeVarchar, TypeAscii, TypeBlob, TypeText: | ||
| return true | ||
| case TypeCounter: | ||
| return true | ||
| case TypeDuration, TypeDate, TypeTime: | ||
| return true | ||
| case TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint: | ||
| return true | ||
| case TypeInet: | ||
| return true | ||
| case TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple: | ||
| return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't combine it in one return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that instead all of this cases could be single "if", or one case for all suitable types?
As I can see, it's structured by similar types for better readability, not sure that it should be merged into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability is the reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just showing what you said no to 🤣
switch elemType.Type() {
case TypeVarchar, TypeAscii, TypeBlob, TypeText,
TypeCounter,
TypeDuration, TypeDate, TypeTime,
TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint,
TypeInet,
TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple:
return true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a discussion about { put always on new line in C* code base. Changed as you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't insisting, just sharing a visual.
| case TypeCustom: | ||
| switch info.(type) { | ||
| case VectorType: | ||
| return marshalVector(info.(VectorType), value) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested switch looks redundant.
Maybe something like this would be more suitable?
case TypeCustom:
if vector, ok := info.(VectorType); ok {
return marshalVector(vector, value)
}
| switch elemType.Type() { | ||
| case TypeVarchar, TypeAscii, TypeBlob, TypeText: | ||
| return true | ||
| case TypeCounter: | ||
| return true | ||
| case TypeDuration, TypeDate, TypeTime: | ||
| return true | ||
| case TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint: | ||
| return true | ||
| case TypeInet: | ||
| return true | ||
| case TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple: | ||
| return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that instead all of this cases could be single "if", or one case for all suitable types?
As I can see, it's structured by similar types for better readability, not sure that it should be merged into one.
marshal.go
Outdated
| buf.Write(tmp) | ||
| } | ||
|
|
||
| func readUnsignedVint(data []byte, start int) (uint64, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readUnsignedVint -> readUnsignedVInt, it would be a bit more consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already encVint so maybe let us leave it as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be inconsistent with writeUnsignedVInt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
marshal.go
Outdated
| buf.Write(tmp) | ||
| } | ||
|
|
||
| func readUnsignedVint(data []byte, start int) (uint64, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe it should be provided with unit tests, maybe even with fuzzing, to make sure that everything works properly.
Also, I think that it should be along with some documentation, to help developers dive into it faster.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description is in native protocol, so we do not need to duplicate it: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L392C23-L399C45. Method signature already indicates what is the purpose. I can add a test.
marshal.go
Outdated
| for i := 0; i < info.Dimensions; i++ { | ||
| offset := 0 | ||
| if isVectorVariableLengthType(info.SubType) { | ||
| m, p, err := readUnsignedVint(data, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start parameter of the readUnsignedVint func seem to be redundunt. Could move inside the func. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is but I would leave it as is when data will not represent just unsigned Vint, but longer slice. But if that is a problem, I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding is kind of distracting. We can always add the parameter back when there's an actual requirement for it. I'd probably remove it if there is no use of it.
@joao-r-reis, I’d appreciate your thoughts on this when you have a moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding is kind of distracting. We can always add the parameter back when there's an actual requirement for it. I'd probably remove it if there is no use of it. @joao-r-reis, I’d appreciate your thoughts on this when you have a moment.
Agree
| if isVectorVariableLengthType(info.SubType) { | ||
| writeUnsignedVInt(buf, uint64(len(item))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation — that clarifies things. I was curios how data considered "Vector Variable Length" in this case. So if you like it, I suggest adding comment as following:
// isVectorVariableLengthType determines if a type requires explicit length serialization within a vector.
Variable-length types need their length encoded before the actual data to allow proper deserialization.
Fixed-length types, on the other hand, don't require this kind of length prefix.
Also, if there are no serialization requirements for fixed-length types, should TypeSmallInt and TypeTinyInt be removed from isVectorVariableLengthType?
| case TypeCustom: | ||
| switch elemType.(type) { | ||
| case VectorType: | ||
| vecType := elemType.(VectorType) | ||
| return isVectorVariableLengthType(vecType.SubType) | ||
| } | ||
| return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we agreed on similar switch statement, let's then keep it consistent 🙏
ctx: #1828 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Changed to if statement like in other cases.
ribaraka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing job!
|
@joao-r-reis: Changed C* versions to run integration tests from 4.0 and 4.1, to 4.1 and 5.0. |
joao-r-reis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff
|
I'll wait for approvals by @tengu-alt and @OleksiienkoMykyta before merging this |
|
After they approve please squash the commits and use a commit description according to our guidelines |
|
@tengu-alt @OleksiienkoMykyta do you have time to review/approve this PR this week? |
Sure, I'll check it this week |
tengu-alt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, looks good to me!
OleksiienkoMykyta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the great contribution
|
@lukasz-antoniak can you squash and edit commit msg? |
|
Done. |
|
missing entry on |
Support marshalling and unmarshalling of vector custom type. patched by Lukasz Antoniak; reviewed by João Reis, Stanislav Bychkov, Oleksandr Luzhniy, Mykyta Oleksiienko and Bohdan Siryk for CASSGO-11
Fix #1734.
Complete vector type support for GoCQL. Supports fixed and variable-length vector types (e.g. vector of floats, vector of list of text).